44 research outputs found

    Hidden Semi Markov Models for Multiple Observation Sequences: The mhsmm Package for R

    This paper describes the R package mhsmm which implements estimation and prediction methods for hidden Markov and semi-Markov models for multiple observation sequences. Such techniques are of interest when observed data is thought to be dependent on some unobserved (or hidden) state. Hidden Markov models only allow a geometrically distributed sojourn time in a given state, while hidden semi-Markov models extend this by allowing an arbitrary sojourn distribution. We demonstrate the software with simulation examples and an application involving the modelling of the ovarian cycle of dairy cows.
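The contrast between geometric and arbitrary sojourn times can be illustrated with a minimal Python sketch. It is not taken from the mhsmm package: the function names, the self-transition probability of 0.9, and the three-point duration distribution are all hypothetical values chosen for illustration.

```python
from typing import Dict


def geometric_sojourn_pmf(p_stay: float, duration: int) -> float:
    """P(sojourn == duration) for an HMM state with self-transition probability p_stay.

    Remaining in a state for `duration` steps means staying (duration - 1) times
    and then leaving once, which gives a geometric distribution.
    """
    if duration < 1:
        return 0.0
    return (p_stay ** (duration - 1)) * (1.0 - p_stay)


def expected_sojourn(pmf: Dict[int, float]) -> float:
    """Mean duration under a discrete sojourn distribution."""
    return sum(d * p for d, p in pmf.items())


# HMM: the sojourn distribution is forced to be geometric (assumed p_stay = 0.9).
hmm_pmf = {d: geometric_sojourn_pmf(0.9, d) for d in range(1, 500)}

# HSMM: the sojourn distribution can be specified freely, for example
# concentrated around a typical cycle length (hypothetical values).
hsmm_pmf = {20: 0.25, 21: 0.5, 22: 0.25}

print(round(expected_sojourn(hmm_pmf), 3))
print(expected_sojourn(hsmm_pmf))
```

Under the geometric distribution a duration of one step is always the most probable, whereas the explicit distribution can place its mass around a typical duration, which is the flexibility the abstract attributes to hidden semi-Markov models.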

    Multicohort analysis of the maternal age effect on recombination

    Several studies have reported that the number of crossovers increases with maternal age in humans, but others have found the opposite. Resolving the true effect has implications for understanding the maternal age effect on aneuploidies. Here, we revisit this question in the largest sample to date using single nucleotide polymorphism (SNP)-chip data, comprising over 6,000 meioses from nine cohorts. We develop and fit a hierarchical model to allow for differences between cohorts and between mothers. We estimate that over 10 years, the expected number of maternal crossovers increases by 2.1% (95% credible interval (0.98%, 3.3%)). Our results are not consistent with the larger positive and negative effects previously reported in smaller cohorts. We see heterogeneity between cohorts that is likely due to chance effects in smaller samples, or possibly to confounders, emphasizing that care should be taken when interpreting results from any specific cohort about the effect of maternal age on recombination.

    A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness

    Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results, we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
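The switch error rate used to compare phasing methods can be sketched as follows. This follows one common definition (the fraction of consecutive heterozygous sites whose relative phase differs from the truth) and is not necessarily the exact evaluation procedure used in the paper; the function name and the toy haplotypes are hypothetical.

```python
from typing import List, Sequence


def switch_error_rate(true_hap: Sequence[int],
                      inferred_hap: Sequence[int],
                      het_sites: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous site pairs with inconsistent relative phase.

    true_hap and inferred_hap hold the allele (0 or 1) carried by the first
    haplotype at every site; het_sites lists the indices of heterozygous sites.
    """
    # At a heterozygous site the inferred first haplotype either matches the
    # true first haplotype or matches the true second haplotype.
    agree: List[bool] = [true_hap[i] == inferred_hap[i] for i in het_sites]
    # A switch occurs whenever that agreement flips between neighbouring het sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    opportunities = len(agree) - 1
    return switches / opportunities if opportunities > 0 else 0.0


# Toy example (hypothetical data): six heterozygous sites, two switches.
truth = [0, 1, 0, 1, 1, 0]
inferred = [0, 1, 1, 0, 0, 0]
rate = switch_error_rate(truth, inferred, het_sites=range(6))
print(rate)
```

Because only changes in agreement are counted, an inferred haplotype that is the exact complement of the truth at every heterozygous site scores zero, which reflects that the labelling of the two haplotypes is arbitrary.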

    Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics

    Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community. 
    Conclusion: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.

    Atrial fibrillation genetic risk differentiates cardioembolic stroke from other stroke subtypes

    Statistical methods for genotype microarray data on large cohorts of individuals

    Genotype microarrays assay hundreds of thousands of genetic variants on an individual's genome. The availability of this high-throughput genotyping capability has transformed the field of genetics over the past decade by enabling thousands of individuals to be rapidly assayed. This has led to the discovery of hundreds of genetic variants that are associated with disease and other phenotypes in genome-wide association studies (GWAS). These data have also brought with them a number of new statistical and computational challenges. This thesis deals with two primary analysis problems involving microarray data: genotype calling and haplotype inference. Genotype calling involves converting the noisy bivariate fluorescent signals generated by microarray data into genotype values for each genetic variant and individual. Poor-quality genotype calling can lead to false positives and loss of power in GWAS, so this is an important task. We introduce a new genotype calling method that is highly accurate and has the novel capability of fusing microarray data with next-generation sequencing data for greater accuracy and fewer missing values. Our new method compares favourably to other available genotype calling software. Haplotype inference (or phasing) involves deconvolving these genotypes into the two inherited parental chromosomes for an individual. The development of phasing methods has been a fertile field for statistical genetics research for well over ten years. Depending on the demography of a cohort, different phasing methods may be more appropriate than others. We review the popular offerings and introduce a new approach that tries to unify two distinct problems: the phasing of extended pedigrees and the phasing of unrelated individuals. We conduct an extensive comparison of phasing methods on real and simulated data. Finally, we demonstrate some preliminary results on extending the methodology to sample sizes in the tens of thousands. This thesis is not currently available in ORA.

    Targeting Human MicroRNA Genes Using Engineered Tal-Effector Nucleases (TALENs)

    MicroRNAs (miRNAs) have quickly emerged as important regulators of mammalian physiology owing to their precise control over the expression of critical protein coding genes. Despite significant progress in our understanding of how miRNAs function in mice, there remains a fundamental need to be able to target and edit miRNA genes in the human genome. Here, we report a novel approach to disrupting human miRNA genes ex vivo using engineered TAL-effector (TALE) proteins to function as nucleases (TALENs) that specifically target and disrupt human miRNA genes. We demonstrate that functional TALEN pairs can be designed to enable disruption of miRNA seed regions, or removal of entire hairpin sequences, and use this approach to successfully target several physiologically relevant human miRNAs including miR-155*, miR-155, miR-146a and miR-125b. This technology will allow for a substantially improved capacity to study the regulation and function of miRNAs in human cells, and could be developed into a strategic means by which miRNAs can be targeted therapeutically during human disease.

    Schematic of the miR-155/miR-155* genomic locus and the location of the TALEN pair engineered to target the miR-155* region.

    (A) Schematic of the miR-155/miR-155* genomic locus. The three BIC exons are shown in yellow and the miR-155 hairpin is in red. (B) Schematic of the miR-155 hairpin structure. The miR-155 arms are shown in black, while the mature miR-155 and miR-155* sequences are in dark grey. Blue and green boxes represent the binding sites of the TALEN pair designed to target the miR-155* region, and hexagons represent the heterodimerized FokI enzyme positioned over the spacer sequence. (C) The two expression plasmids containing the TALEN pair along with the FokI nuclease domain, and the TALEN-RVD sequences corresponding to each targeted DNA sequence are shown. The details of pCS2TAL3-DDD and pCS2TAL3-RRR expression vectors are described in the Materials and Methods section. NN, HD, NG and NI represent the RVD regions of each repeat sequence that bind to nucleotide G, C, T and A, respectively. The left and right TALEN binding sequences are shown in red and purple, respectively, and the spacer region is in blue.
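The RVD-to-nucleotide correspondence stated in the caption (NN to G, HD to C, NG to T, NI to A) can be written as a small lookup. This is a minimal sketch: the function name and the example RVD array are hypothetical and do not reproduce the TALEN sequences from the figure.

```python
from typing import Dict, List

# Mapping taken from the caption: each RVD recognises one nucleotide.
RVD_TO_NUCLEOTIDE: Dict[str, str] = {
    "NN": "G",
    "HD": "C",
    "NG": "T",
    "NI": "A",
}


def rvds_to_target(rvds: List[str]) -> str:
    """Translate an ordered list of RVDs into the DNA sequence they would bind."""
    try:
        return "".join(RVD_TO_NUCLEOTIDE[rvd] for rvd in rvds)
    except KeyError as exc:
        # Only the four RVDs named in the caption are covered here.
        raise ValueError(f"Unsupported RVD: {exc.args[0]}") from exc


# Hypothetical RVD array, for illustration only.
example_rvds = ["NN", "HD", "NG", "NI", "HD"]
print(rvds_to_target(example_rvds))
```

Reversing the dictionary gives the design direction instead, from a desired binding sequence to the repeat array that would recognise it.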